Automatic Procedures in Tectogrammatical Tagging
Author
Abstract
A semi-automatic syntactic annotation of a part of the Czech National Corpus in the Prague Dependency Treebank (PDT) has, among its aims, that of making it possible to check the chosen theoretical approach (Functional Generative Description, see [2]). The first phases of the annotation of PDT have been discussed elsewhere: the morphemic representations and the dependency trees on an intermediate analytic level (AL), i.e. the analytic tree structures (ATSs, see [1]). The present paper is devoted to the second, basic phase, the transduction from the analytic level to the (underlying) syntax itself, i.e. to tectogrammatical representations. These are to be provided for 10 000 sentences during the year 2000; at the start of that year, 100 000 sentences had already received their ATS annotations.
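To make the idea of transducing an ATS into a tectogrammatical representation more concrete, here is a minimal Python sketch. It is not the PDT procedure. The node classes, the sets of analytic functions treated as punctuation or function words, and the rule that folds each function word into its nearest content word are all simplifying assumptions made for illustration. The sketch shows only one property of the tectogrammatical level: function words and punctuation do not receive nodes of their own. A real tectogrammatical tree would also assign functors (e.g. ACT for the actor) and grammatemes, which the sketch replaces with a copy of the analytic function.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified node of an analytic tree structure (ATS).
@dataclass
class ANode:
    form: str                       # word form as it occurs in the sentence
    afun: str                       # analytic function, e.g. "Pred", "Sb", "AuxP"
    children: list["ANode"] = field(default_factory=list)


# Hypothetical, simplified node of a tectogrammatical tree structure.
@dataclass
class TNode:
    word: str
    role: str                       # placeholder: copied afun, NOT a real functor
    aux: list[str] = field(default_factory=list)       # function words folded in
    children: list["TNode"] = field(default_factory=list)


# Assumed classification for this sketch only.
PUNCT = {"AuxK", "AuxX", "AuxG"}           # punctuation: no t-node at all
FUNCTION_WORDS = {"AuxP", "AuxC", "AuxV"}  # function words: folded into a content node


def transduce(a: ANode) -> tuple[list[TNode], list[str]]:
    """Convert analytic node `a` together with its subtree.

    Returns the t-nodes that take the place of `a`, plus a list of function
    words that still have to be attached to the nearest content ancestor.
    """
    kids: list[TNode] = []
    pending: list[str] = []                # function words handed up by children
    for child in a.children:
        nodes, orphans = transduce(child)
        kids.extend(nodes)
        pending.extend(orphans)

    if a.afun in PUNCT:
        return kids, pending               # drop the node, keep its dependents
    if a.afun in FUNCTION_WORDS:
        if kids:                           # e.g. a preposition governing its noun
            for k in kids:
                k.aux.insert(0, a.form)
            return kids, pending
        return [], pending + [a.form]      # e.g. an auxiliary verb: pass it upward
    return [TNode(a.form, a.afun, pending, kids)], []


def show(t: TNode, depth: int = 0) -> None:
    """Print a t-tree, one node per line, with folded function words in brackets."""
    aux = f" [{' '.join(t.aux)}]" if t.aux else ""
    print("  " * depth + f"{t.word} ({t.role}){aux}")
    for c in t.children:
        show(c, depth + 1)


# "Petr by přišel do školy." (roughly "Petr would come to school."),
# as a hand-built, simplified analytic tree.
sentence = ANode("přišel", "Pred", [
    ANode("Petr", "Sb"),
    ANode("by", "AuxV"),
    ANode("do", "AuxP", [ANode("školy", "Adv")]),
    ANode(".", "AuxK"),
])

(root,), leftover = transduce(sentence)
show(root)
```

Running it prints three nodes: `přišel (Pred) [by]`, then `Petr (Sb)` and `školy (Adv) [do]` as its dependents. The auxiliary verb and the preposition survive only as attributes, and the final full stop disappears. Returning pending function words to the caller keeps the recursion purely bottom-up, so an auxiliary verb with no dependents of its own can still reach its head.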
Similar resources
Syntactic Tagging: Procedure for the Transition from the Analytic to the Tectogrammatical Tree Structures
The syntactic tagging of the Prague Dependency Treebank (PDT) is divided into two steps, the first resulting in analytic tree structures (ATS) and the second in tectogrammatical tree structures (TGTS). The present paper describes the transition procedures, automatic and manual, from ATS to TGTS and illustrates these procedures on two Czech sentences. Syntactic tagging in The Prague Dependency Tree...
Coreferential Relations In The Prague Dependency Treebank
Corpus annotation in PDT is performed on several levels and in several steps. The annotation of coreference relations is carried out on underlying (tectogrammatical) tree structures assigned to the sentences in the text on independent (and theoretically based) grounds, which makes it possible to systematically include in the annotation the superficially “null” (unrealized) anaphors and o...
Czech-English Dependency-based Machine Translation
We present some preliminary results of a Czech-English translation system based on dependency trees. The fully automated process includes: morphological tagging, analytical and tectogrammatical parsing of Czech, tectogrammatical transfer based on lexical substitution using word-to-word translation dictionaries enhanced by the information from the English-Czech parallel corpus of WSJ, and a simp...
English-Czech Machine Translation Using TectoMT
English to Czech machine translation as it is implemented in the TectoMT system consists of three phases: analysis, transfer and synthesis. The system uses tectogrammatical (deep-syntactic dependency) trees as the transfer medium. Each phase is divided into so-called blocks, which are processing units that solve linguistically interpretable tasks (e.g., statistical part-of-speech tagging or rul...
Prague Dependency Treebank: From analytic to tectogrammatical annotations
The Prague Dependency Treebank is conceived of as an annotated corpus of written Czech, comprising three layers of annotations. In the present paper, we focus on a more detailed description of the structure and contents of the tectogrammatical syntactic trees (underlying sentence representations) and a specification of the transition from the analytic syntactic tree to the tectogrammatical one....
Journal: Prague Bull. Math. Linguistics
Volume 76, Issue
Pages -
Publication date: 2001